Fast Prediction for Large-Scale Kernel Machines
Abstract
Kernel machines such as kernel SVM and kernel ridge regression usually produce high-quality models; however, their use in real-world applications remains limited because of their high prediction cost. In this paper, we present two novel insights for improving the prediction efficiency of kernel machines. First, we show that by adding “pseudo landmark points” to the classical Nyström kernel approximation, we can significantly reduce the prediction error with little additional prediction cost. Second, we provide a new theoretical analysis that bounds the error of the solution computed with the Nyström kernel approximation, and show that this error is related to the weighted k-means objective, where the weights are given by the model computed with the original kernel. This insight suggests a new landmark selection technique for the setting in which the original model is known. Based on these two insights, we propose a two-phase divide-and-conquer framework for fast prediction. In the first phase, we divide the whole problem into smaller local subproblems to reduce the problem size. In the second phase, we develop a fast prediction approach based on kernel approximation within each subproblem. We apply our algorithm to real-world large-scale classification and regression datasets and show that it consistently and significantly outperforms competing methods. For example, on the Covertype classification problem, our algorithm achieves a prediction-time speedup of more than 10,000 times over the full kernel SVM and of two times over the state-of-the-art LDKL approach, while obtaining much higher prediction accuracy than LDKL (95.2% vs. 89.53%).
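The kernel-approximation phase can be illustrated with a minimal sketch. It assumes kernel ridge regression with a Gaussian kernel as the "original model". It also assumes that the weighted k-means weights are the absolute dual coefficients |α_i|, since the abstract only says the weights come from the original model. The pseudo-landmark construction and the divide-and-conquer phase are omitted, and the helper names (`rbf_kernel`, `fit_krr`, `weighted_kmeans`, `nystrom_predictor`) are hypothetical. With landmarks Z, the Nyström approximation k(x, x_i) ≈ k(x, Z) K_ZZ^{-1} k(Z, x_i) turns the full predictor f(x) = Σ_i α_i k(x, x_i) into f(x) ≈ k(x, Z) β with β = K_ZZ^{-1} K_ZX α. Each prediction then needs m kernel evaluations instead of n.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """Gaussian kernel matrix with entries exp(-gamma * ||a - b||^2)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_krr(X, y, lam=0.1, gamma=0.5):
    """Exact kernel ridge regression: alpha = (K + lam * I)^{-1} y."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def weighted_kmeans(X, w, m, iters=25, seed=0):
    """Lloyd's algorithm in which point i pulls its centroid with weight w[i]."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=m, replace=False)].copy()
    for _ in range(iters):
        dist = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        labels = dist.argmin(1)
        for j in range(m):
            mask = labels == j
            total = w[mask].sum()
            if total > 0:  # leave empty or zero-weight clusters where they are
                C[j] = (w[mask][:, None] * X[mask]).sum(0) / total
    return C

def nystrom_predictor(X, alpha, Z, gamma=0.5, jitter=1e-6):
    """Collapse sum_i alpha_i k(x, x_i) into k(x, Z) @ beta using landmarks Z."""
    K_zz = rbf_kernel(Z, Z, gamma) + jitter * np.eye(len(Z))  # jitter for numerical stability
    beta = np.linalg.solve(K_zz, rbf_kernel(Z, X, gamma) @ alpha)  # m coefficients, computed once
    return lambda Xt: rbf_kernel(Xt, Z, gamma) @ beta  # m kernel evaluations per test point

# Toy demo on synthetic data (not the paper's datasets).
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(500, 2))
y = np.sin(X[:, 0]) + np.cos(X[:, 1]) + 0.1 * rng.standard_normal(500)
alpha = fit_krr(X, y)

Z = weighted_kmeans(X, np.abs(alpha), m=50)  # assumed weights: |alpha_i|
predict_fast = nystrom_predictor(X, alpha, Z)

X_test = rng.uniform(-3, 3, size=(200, 2))
exact = rbf_kernel(X_test, X) @ alpha  # full model: 500 kernel evaluations per point
fast = predict_fast(X_test)            # approximation: 50 kernel evaluations per point
print("mean |fast - exact| =", np.abs(fast - exact).mean())
```

K_ZX α is simply the vector of full-model predictions at the landmarks, so β amounts to interpolating the full model at Z. The small jitter keeps that solve stable when two landmarks lie close together.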